ACOUSTIC MODEL ADAPTATION FOR AUTOMATIC SPEECH RECOGNITION AND ANIMAL VOCALIZATION CLASSIFICATION

Author

  • Jidong Tao
Abstract

Jidong Tao, B.Eng., M.S., Marquette University, 2009

Automatic speech recognition (ASR) converts human speech to readable text. Acoustic model adaptation, also called speaker adaptation, is one of the most promising techniques in ASR for improving recognition accuracy. Adaptation works by tuning a general-purpose acoustic model to a specific one according to the person who is using it. Speaker adaptation methods can be categorized as Bayesian-based, transformation-based, or model combination-based. Model combination-based speaker adaptation has been shown to have an advantage over the traditional Bayesian-based and transformation-based methods when the amount of adaptation speech is as small as a few seconds. However, model combination-based rapid speaker adaptation has not been widely used in practical applications, since it requires large amounts of speaker-dependent (SD) training data from multiple speakers.

This research proposes a new technique, eigen-clustering, to eliminate the need for large quantities of speaker-labeled training utterances, so that model combination-based adaptation can start from much less expensive speaker-independent (SI) data. Based on principal component analysis (PCA), this technique constructs an eigenspace using each utterance in the training set. The proposed adaptation method can not only improve human speech recognition directly but may also contribute to animal vocalization analysis and behavior studies. Application to the field of bioacoustics is especially meaningful because the amount of collected animal vocalization data is often limited, and fast adaptation methods are therefore naturally suitable.
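The abstract does not say how utterances are represented or how the clustering step works, so the following is only a rough sketch of the PCA part of the idea, not the author's algorithm. It assumes each training utterance has already been summarized as a fixed-length vector (a hypothetical representation), builds an eigenspace from those vectors, and projects one utterance into it. The function names `build_eigenspace`, `project`, and `reconstruct` and the synthetic data are illustrative assumptions.

```python
import numpy as np


def build_eigenspace(utterance_vectors, num_components):
    """PCA over per-utterance vectors.

    Returns the mean vector and a basis whose rows are the leading
    principal directions (orthonormal).
    """
    data = np.asarray(utterance_vectors, dtype=float)
    mean = data.mean(axis=0)
    # The right singular vectors of the centered data are the principal axes.
    _, _, vt = np.linalg.svd(data - mean, full_matrices=False)
    return mean, vt[:num_components]


def project(vector, mean, basis):
    """Coordinates of one utterance vector inside the eigenspace."""
    return basis @ (np.asarray(vector, dtype=float) - mean)


def reconstruct(coords, mean, basis):
    """Map eigenspace coordinates back to the original vector space."""
    return mean + basis.T @ coords


# Synthetic stand-in data (an assumption, not from the thesis):
# 50 "utterances", each a 12-dimensional vector lying near a 2-dimensional subspace.
rng = np.random.default_rng(0)
latent = rng.normal(size=(50, 2))
mixing = rng.normal(size=(2, 12))
utterances = latent @ mixing + 0.01 * rng.normal(size=(50, 12))

mean, basis = build_eigenspace(utterances, num_components=2)
coords = project(utterances[0], mean, basis)
approx = reconstruct(coords, mean, basis)
```

In this sketch, a new utterance is described by only `num_components` coordinates rather than a full vector, which is the general reason eigenspace methods can work with very little adaptation data.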

Similar resources

Acoustic model adaptation for ortolan bunting (Emberiza hortulana L.) song-type classification.

Automatic systems for vocalization classification often require fairly large amounts of data on which to train models. However, animal vocalization data collection and transcription is a difficult and time-consuming task, so that it is expensive to create large data sets. One natural solution to this problem is the use of acoustic adaptation methods. Such methods, common in human speech recogni...

Acoustic censusing using automatic vocalization classification and identity recognition.

This paper presents an advanced method to acoustically assess animal abundance. The framework combines supervised classification (song-type and individual identity recognition), unsupervised classification (individual identity clustering), and the mark-recapture model of abundance estimation. The underlying algorithm is based on clustering using hidden Markov models (HMMs) and Gaussian mixture ...

Automatic Type Classification and Speaker Identification of African Elephant Vocalizations

This paper presents systems for automatically classifying elephant vocalizations by type and for identifying the speaker of a given vocalization. The method applies techniques from the speech processing field, with modifications, to elephant vocalizations. The features used for classification are 12 Mel-Frequency Cepstral Coefficients computed using a chirp Z-transform to interpolate among the ...

Hidden Markov Model Based Animal Acoustic Censusing: Learning from Speech Processing Technology

Individually distinct acoustic features have been observed in a wide range of vocally active animal species and have been used to study animals for decades. Only a few studies, however, have attempted to examine the use of acoustic identification of individuals to assess population, either for evaluating the population structure, population abundance and density, or for assessing animal seasona...

Automatic Frame Length, Frame Overlap and Hidden Markov Model Topology for Speech Recognition of Animal Vocalizations

Preface Automatic Speech Recognition (ASR) is a useful tool that can facilitate the research and study of animal vocalizations. The use of human speech-based signal processing techniques for animal vocalizations has several pitfalls. Animal vocalizations may not share the same spectral or temporal characteristics as human speech. As a result, the typical ASR assumptions concerning the best fra...

Publication date: 2009